RR9 Retina Dataset Integration

Author

Lauren Sanders, Jian Gong, Vaishnavi Nagesh

Digital Twin Project Description

Space biology confronts a critical obstacle: the challenge of incomplete data due to the logistical complexities and high costs of space missions. Addressing this issue, this research presents strategies that integrate AI and digital twin technology to overcome the limitations posed by sparse datasets in space biology research.

By presenting a cohesive strategy that combines synthetic data generation, automatic labeling, and advanced machine learning with digital twins, we showcase an application to the RR9 dataset at OSDR. This research aims to overcome the challenge of data scarcity in space biology, thereby forging a way to unlock insights into the potential of life beyond Earth.

RR9 Background

The Rodent Research 9 (RR9) payload consisted of three space biology experiments designed to examine the impacts of long-duration spaceflight on the visual impairment and joint tissue degradation that affect astronauts.

Investigation 1: Effects of microgravity on fluid shifts and the increased fluid pressures that occur in the head.
1. Determine whether spaceflight on the ISS alters rodent basilar artery spontaneous tone, myogenic and KCl (potassium chloride)-evoked vasoconstriction, mechanical stiffness, and gross structure.
2. Estimate whether spaceflight on the ISS alters the blood-brain barrier in rodents, as indicated by ultrastructural examination of the junctional complex of the cerebral capillary endothelium.
3. Determine whether spaceflight on the ISS alters rodent basal vein (inside cranium) and jugular vein (outside cranium) spontaneous tone, myogenic and KCl-evoked constriction, distension, and gross structure.
4. Determine whether spaceflight on the ISS alters the ability of the cervical lymphatics to modulate lymph flow and thus regulate cerebral fluid homeostasis.

Investigation 2: Impact of spaceflight on the vessels that supply blood to the eyes.
1. Define the relationships between spaceflight condition-induced oxidative stress (reactive oxygen species, ROS, expression), retinal vascular remodeling, and blood-retinal barrier (BRB) function in mice returned to Earth alive.
2. Determine whether spaceflight condition-induced oxidative damage in the retina is mediated through photoreceptor mitochondrial ROS production.

Investigation 3: Extent of knee and hip joint degradation caused by prolonged exposure to weightlessness.
1. Determine the extent of knee and hip joint degradation in mice after 30 days of spaceflight on the ISS.
2. Use the DigiGait system to assess gait patterns before and after returning from the ISS.

We are interested in the retinal data and everything related to the eye, so the experiments from Investigations 1 and 2 are studied here. Below is a table of all OSD identifiers related to these investigations, obtained from https://osdr.nasa.gov/bio/repo/data/payloads/RR-9

OSD-557: Spaceflight influences gene expression, photoreceptor integrity, and oxidative stress related damage in the murine retina (RR-9). Factor: Spaceflight. Assay types: bone microstructure, molecular cellular imaging, histology.
OSD-568: Characterization of mouse ocular responses (microscopy) to a 35-day (RR-9) spaceflight mission: Evidence of blood-retinal barrier disruption and ocular adaptations. Factor: Spaceflight. Assay type: molecular cellular imaging.
OSD-715: Characterization of mouse ocular response to a 35-day spaceflight mission: Evidence of blood-retinal barrier disruption and ocular adaptations - Proteomics data. Factor: Spaceflight. Assay type: protein expression profiling.
OSD-255: Spaceflight influences gene expression, photoreceptor integrity, and oxidative stress-related damage in the murine retina. Factor: Spaceflight. Assay type: transcription profiling.
OSD-140: Space Flight Environment Induces Remodeling of Vascular Network and Glia-Vascular Communication in Mouse Retina. Factor: Spaceflight. Assay type: (not listed).
OSD-583: Characterization of mouse ocular responses (intraocular pressure) to a 35-day (RR-9) spaceflight mission: Evidence of blood-retinal barrier disruption and ocular adaptations. Factor: Spaceflight. Assay type: tonometry.

The purpose of this notebook is to combine all retina data from the Rodent Research 9 (RR9) mission from the NASA Open Science Data Repository, perform exploratory data analysis, impute missing data and train a digital twin.

Original Author: Lauren Sanders

Additional Author(s): Jian Gong, Vaishnavi Nagesh

Load Data

We are downloading all the relevant data as shown in the table below.

Data Type Data Links

Data Exploration and Validation

The table below shows the number of features in each dataset that constitutes the RR9 multi-modal data. This is useful for identifying the maximum number of PCA components required to explain the cumulative variance in each dataset.

Data Rows X Features

Summary of the Merged Data frame

PCA on Different Categories of Datasets
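The subsections below call a `perform_pca` helper whose definition is not shown in this excerpt. A minimal sketch of what such a helper could look like, with the signature inferred from its later usage (`perform_pca(rr9_all_df, dataset_name=...)` returning a PCA frame and a plot), is given here as an assumption, not the notebook's actual implementation; the notebook itself appears to plot with Plotly, while this sketch returns a Matplotlib figure:

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless-safe backend for the sketch
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def perform_pca(rr9_all_df, dataset_name, n_components=2):
    """Run PCA on one modality's feature columns from the merged RR9 frame.

    dataset_name is the modality DataFrame whose columns select the features;
    rows with any missing value are dropped before scaling.
    """
    feature_cols = dataset_name.columns.to_list()
    sub = rr9_all_df[feature_cols + ['Source Name', 'Group']].dropna(how='any')
    scaled = StandardScaler().fit_transform(sub[feature_cols])
    pcs = PCA(n_components=n_components).fit_transform(scaled)
    pca_df = pd.DataFrame(pcs, columns=[f'PC{i + 1}' for i in range(n_components)])
    pca_df[['Source Name', 'Group']] = sub[['Source Name', 'Group']].reset_index(drop=True)

    # one scatter series per group so flight vs. ground separation is visible
    fig, ax = plt.subplots()
    for group, grp in pca_df.groupby('Group'):
        ax.scatter(grp['PC1'], grp['PC2'], label=group)
    ax.set_xlabel('PC1')
    ax.set_ylabel('PC2')
    ax.legend(title='Group')
    return pca_df, fig
```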

PCA on RNASeq

PCA on RNASeq for Only Predictive Genes for Phenotype

Code
genes_predictive_of_phenotypes = [
    'ENSMUSG00000021185',
    'ENSMUSG00000021432',
    'ENSMUSG00000021712',
    'ENSMUSG00000023484',
    'ENSMUSG00000025484',
    'ENSMUSG00000026768',
    'ENSMUSG00000028184',
    'ENSMUSG00000028423',
    'ENSMUSG00000029499',
    'ENSMUSG00000036636',
    'ENSMUSG00000039994',
    'ENSMUSG00000041685',
    'ENSMUSG00000042190',
    'ENSMUSG00000045318',
    'ENSMUSG00000050538',
    'ENSMUSG00000052373',
    'ENSMUSG00000068250',
    'ENSMUSG00000068394',
    'ENSMUSG00000070822',
    'ENSMUSG00000073879',
    'ENSMUSG00000084408',
    'ENSMUSG00000097061',
    'ENSMUSG00000097180',
    'ENSMUSG00000106147',
    'ENSMUSG00000107195',
    'ENSMUSG00000110357',
]
# Verify that all predictive genes are present in the RNASeq dataset
assert set(genes_predictive_of_phenotypes).issubset(rnaseq.columns.to_list())

# Filter the RNASeq dataset to include only the genes predictive of phenotypes
filtered_rnaseq = rnaseq[genes_predictive_of_phenotypes]

# Perform PCA on the filtered dataset
phenotype_rnaseq_pca_df, scatter_plt = perform_pca(rr9_all_df, dataset_name=filtered_rnaseq)
scatter_plt.show()

PCA on Proteomics

PCA on TUNEL Assay

PCA on HNE Immunostaining Microscopy

PCA on Micro CT

PCA on Combined Immunostaining Microscopy Data from Zo-1, PECAM, PNA and HNE

Zo-1, PECAM, PNA and HNE are all immunostaining microscopy. It would be useful to see if combining them all helps in better separation between the groups.
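A simple way to combine the assays is to concatenate their feature columns before running PCA. The sketch below is an assumption about how the notebook does this: it presumes the per-assay feature DataFrames (e.g. `zo1`, `pecam`, `pna`, `hne`) are indexed identically by sample, and the helper name `combine_modalities` is ours:

```python
import pandas as pd

def combine_modalities(frames):
    """Stack per-assay feature DataFrames side by side for a joint PCA.

    Assumes all frames are indexed identically by sample (row i is the
    same animal in every frame) and use distinct feature column names.
    """
    combined = pd.concat(frames, axis=1)
    # guard against accidental duplicate feature names across assays
    assert combined.columns.is_unique, "duplicate feature columns across assays"
    return combined

# e.g. combined_immuno = combine_modalities([zo1, pecam, pna, hne]),
# then pass combined_immuno as dataset_name to the notebook's PCA helper
```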

From the PCA analysis of the different datasets, there appears to be fair separation between the flight group and the other groups. However, there isn't sufficient data to show separation among the GC, Viv, and CC groups.

Data Analysis: Correlation Between HNE and RNASeq

HNE immunostaining microscopy has data across four different groups (F, GC, Viv, CC2), while RNASeq is available for two groups (F and GC). Among these, samples F15-F20 and GC15-GC20 have data for both HNE and RNASeq.

To anchor the imputation of RNASeq data to biological characteristics, the correlation between RNASeq and HNE needs to be determined.

Code
def analyze_correlation(dataset_name, gene_list, rr9_all_df):
    """
    Analyze the correlation between a given dataset and a list of genes.

    Parameters:
    - dataset_name: DataFrame, the dataset to analyze (e.g., TUNEL, HNE, etc.)
    - gene_list: list, the list of genes to analyze
    - rr9_all_df: DataFrame, the merged RR9 dataset containing all data

    Returns:
    - None, displays heatmaps for correlation matrices
    """
    # Select relevant columns
    rna_cols_to_select = gene_list + ['Source Name', 'Group']
    dataset_cols_to_select = dataset_name.columns.tolist() + ['Source Name', 'Group']
    
    # Filter and drop missing values
    rnaseq_filtered = rr9_all_df[rna_cols_to_select].dropna(how='any').reset_index(drop=True)
    dataset_filtered = rr9_all_df[dataset_cols_to_select].dropna(how='any').reset_index(drop=True)
    
    # Merge the two datasets
    combined_df = pd.merge(dataset_filtered, rnaseq_filtered, on=['Source Name', 'Group'], how='inner')
    # Iterate over each group and compute the correlation matrix.
    # Note: we subset the merged frame directly; reusing combined_df's row
    # labels to index the pre-merge frames would misalign rows after the
    # inner merge.
    for group, group_df in combined_df.groupby('Group'):
        # Compute the correlation matrix for this group's rows
        correlation_matrix = group_df.corr(numeric_only=True)
        
        # Plot the heatmap
        sns.heatmap(
            correlation_matrix,
            cmap='coolwarm',
            annot=False,
            cbar_kws={'label': 'Correlation Coefficient'}
        )
        plt.title(f"Correlation Matrix for Group: {group}")
        plt.show()

analyze_correlation(hne, found_genes, rr9_all_df=rr9_all_df)
analyze_correlation(hne, genes_predictive_of_phenotypes, rr9_all_df=rr9_all_df)

Data Analysis: Correlation Between TUNEL Assay and RNASeq

TUNEL was selected for correlation analysis because its data points separate more cleanly between groups on the PCA plots than the HNE data points do. The TUNEL assay has data across four different groups (F, GC, Viv, CC2), while RNASeq is available for two groups (F and GC). Among these, samples F15-F20 and GC15-GC20 have data for both TUNEL and RNASeq.

To anchor the imputation of RNASeq data to biological characteristics, the correlation between RNASeq and TUNEL needs to be determined.

Code
analyze_correlation(tunel, found_genes, rr9_all_df=rr9_all_df)
analyze_correlation(tunel, genes_predictive_of_phenotypes, rr9_all_df=rr9_all_df)

Imputation of Relevant Genes from TUNEL Data

The first step is to see how many genes from the gene list of interest lack data. Imputation will be done in two groups: flight (F) and non-flight. Samples F9 and F11 have RNASeq values but not TUNEL assay values.

Code
flight_relevant_genes = rr9_all_df[rr9_all_df['Group'] == 'F'][found_genes + genes_predictive_of_phenotypes + ['Source Name', 'Group']]

non_flight_relevant_genes = rr9_all_df[rr9_all_df['Group'] != 'F'][found_genes + genes_predictive_of_phenotypes + ['Source Name', 'Group']]

flight_tunel_data = rr9_all_df[rr9_all_df['Group'] == 'F'][tunel.columns.to_list() + ['Source Name', 'Group']]
non_flight_tunel_data = rr9_all_df[rr9_all_df['Group'] != 'F'][tunel.columns.to_list() + ['Source Name', 'Group']]

merged_flight_data = pd.merge(
    flight_relevant_genes,
    flight_tunel_data,
    on=['Source Name', 'Group'],
    how='outer'
)

merged_non_flight_data = pd.merge(
    non_flight_relevant_genes,
    non_flight_tunel_data,
    on=['Source Name', 'Group'],
    how='outer'
)
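Before imputing, it helps to quantify how much of each feature is actually missing in the merged frames. A small helper along these lines could be used (the name `missing_fraction` is ours, not from the notebook):

```python
import pandas as pd

def missing_fraction(df, id_cols=('Source Name', 'Group')):
    """Fraction of missing values per feature column, worst first."""
    feats = df.drop(columns=list(id_cols), errors='ignore')
    return feats.isna().mean().sort_values(ascending=False)

# e.g. missing_fraction(merged_flight_data) shows which gene/assay
# features need the most imputation in the flight group
```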

Establishing a Classifier to Validate Imputation for TUNEL and RNASeq Datasets

To validate the effectiveness of imputation, we propose building a binary classifier to distinguish between flight and non-flight samples using the complete RNASeq and TUNEL datasets. After imputing missing values in the TUNEL and RNASeq datasets, we will train the same classifier on the imputed data and compare its performance metrics (e.g., accuracy, precision, recall, F1-score) with those obtained from the complete datasets. This comparison will help assess whether the imputation process preserves the integrity and predictive power of the data.
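As a sketch of this validation, one could cross-validate a simple flight vs. non-flight classifier on the complete-case frame and again on each imputed frame, then compare scores. Logistic regression and the helper name `score_flight_classifier` are our assumptions here, not the notebook's chosen model:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def score_flight_classifier(df, cv=3):
    """Mean cross-validated accuracy of a flight vs. non-flight classifier.

    Run once on the complete-case frame and once on each imputed frame;
    similar scores suggest the imputation preserved the predictive signal.
    """
    X = df.drop(columns=['Source Name', 'Group']).to_numpy()
    y = (df['Group'] == 'F').astype(int).to_numpy()
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=cv).mean()
```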

KNN Imputer

Flight Data

Code
imp_knn5 = KNNImputer(n_neighbors=5, weights='distance')
imp_df_knn5 = imp_knn5.fit_transform(merged_flight_data.drop(columns=['Source Name', 'Group']))
imp_df_knn5 = pd.DataFrame(imp_df_knn5, columns=merged_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
imp_df_knn5['Source Name'] = merged_flight_data['Source Name']
imp_df_knn5['Group'] = merged_flight_data['Group']


imp_knn2 = KNNImputer(n_neighbors=2, weights='distance')
imp_df_knn2 = imp_knn2.fit_transform(merged_flight_data.drop(columns=['Source Name', 'Group']))
imp_df_knn2 = pd.DataFrame(imp_df_knn2, columns=merged_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
imp_df_knn2['Source Name'] = merged_flight_data['Source Name']
imp_df_knn2['Group'] = merged_flight_data['Group']


fig = px.imshow(imp_df_knn5.corr(numeric_only=True),  text_auto=True)
fig.show()

fig = px.imshow(imp_df_knn2.corr(numeric_only=True),  text_auto=True)
fig.show()
corr_knn_2_matrix = imp_df_knn2.corr(numeric_only=True)
corr_knn_2_df = corr_knn_2_matrix.unstack().reset_index()
corr_knn_2_df.rename(columns={'level_0': 'para_1', 'level_1':'para_2',
                          0:'corr_coef_knn'}, inplace=True)
/Users/vaishnavinagesh/Desktop/AI-ML_AWG/.venv/lib/python3.12/site-packages/sklearn/utils/extmath.py:205: RuntimeWarning:

divide by zero encountered in matmul

(the same RuntimeWarning is repeated for "overflow encountered in matmul" and "invalid value encountered in matmul")

Non-Flight Data

Code
imp_knn5 = KNNImputer(n_neighbors=5, weights='distance')
imp_df_knn5 = imp_knn5.fit_transform(merged_non_flight_data.drop(columns=['Source Name', 'Group']))
imp_df_knn5 = pd.DataFrame(imp_df_knn5, columns=merged_non_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
imp_df_knn5['Source Name'] = merged_non_flight_data['Source Name']
imp_df_knn5['Group'] = merged_non_flight_data['Group']


imp_knn2 = KNNImputer(n_neighbors=2, weights='distance')
imp_df_knn2 = imp_knn2.fit_transform(merged_non_flight_data.drop(columns=['Source Name', 'Group']))
imp_df_knn2 = pd.DataFrame(imp_df_knn2, columns=merged_non_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
imp_df_knn2['Source Name'] = merged_non_flight_data['Source Name']
imp_df_knn2['Group'] = merged_non_flight_data['Group']


fig = px.imshow(imp_df_knn5.corr(numeric_only=True),  text_auto=True)
fig.show()

fig = px.imshow(imp_df_knn2.corr(numeric_only=True),  text_auto=True)
fig.show()
/Users/vaishnavinagesh/Desktop/AI-ML_AWG/.venv/lib/python3.12/site-packages/sklearn/utils/extmath.py:205: RuntimeWarning:

divide by zero encountered in matmul

(the same RuntimeWarning is repeated for "overflow encountered in matmul" and "invalid value encountered in matmul")

Random Sample Imputer

This imputer is useful when more than 25-30% of the data needs to be imputed, and it is also fast compared to the other methods.

Flight Data

Code
rsi = RandomSampleImputer()
rsi_df = rsi.fit_transform(merged_flight_data.drop(columns=['Source Name', 'Group']))
rsi_df = pd.DataFrame(rsi_df, columns=merged_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
rsi_df['Source Name'] = merged_flight_data['Source Name']
rsi_df['Group'] = merged_flight_data['Group']


fig = px.imshow(rsi_df.corr(numeric_only=True),  text_auto=True)
fig.show()

corr_rsi_matrix = rsi_df.corr(numeric_only=True)
corr_rsi_df = corr_rsi_matrix.unstack().reset_index()
corr_rsi_df.rename(columns={'level_0': 'para_1', 'level_1':'para_2',
                          0:'corr_coef_rsi'}, inplace=True)

Non-Flight Data

Code
rsi = RandomSampleImputer()
rsi_df = rsi.fit_transform(merged_non_flight_data.drop(columns=['Source Name', 'Group']))
rsi_df = pd.DataFrame(rsi_df, columns=merged_non_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
rsi_df['Source Name'] = merged_non_flight_data['Source Name']
rsi_df['Group'] = merged_non_flight_data['Group']
fig = px.imshow(rsi_df.corr(numeric_only=True),  text_auto=True)
fig.show()

Multiple Imputation by Chained Equation

One can impute missing values by predicting them from the other features in the dataset.

MICE, or "Multiple Imputation by Chained Equations" (also known as "Fully Conditional Specification"), is a popular approach for doing this.

Here is a quick intuition (not the exact algorithm):

  • Take the variable that contains missing values as the response 'Y' and the other variables as predictors 'X'.

  • Build a model using the rows where Y is not missing.

  • Predict the missing observations.

Do this multiple times with random draws of the data, and take the mean of the predictions.
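The steps above can be sketched on a toy single-column example. This is only an illustration of the intuition, not scikit-learn's IterativeImputer; the function name and the use of plain linear regression are our choices:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def mice_single_column(df, target, n_draws=5, seed=2):
    """Impute one column with gaps: fit target ~ other columns on bootstrap
    resamples of the observed rows, predict the missing rows, and average."""
    rng = np.random.default_rng(seed)
    obs = df[df[target].notna()]
    mis = df[df[target].isna()]
    x_cols = [c for c in df.columns if c != target]
    preds = []
    for _ in range(n_draws):
        # random draw of the observed data (bootstrap resample)
        boot = obs.sample(n=len(obs), replace=True,
                          random_state=int(rng.integers(1 << 31)))
        model = LinearRegression().fit(boot[x_cols], boot[target])
        preds.append(model.predict(mis[x_cols]))
    out = df.copy()
    out.loc[mis.index, target] = np.mean(preds, axis=0)  # mean of the draws
    return out
```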

Flight Data

Code
lgbr = HistGradientBoostingRegressor(random_state=2)
itera_imp = IterativeImputer(random_state=2, initial_strategy='median', estimator=lgbr, max_iter=10, verbose=2)
itera_imp.fit(merged_flight_data.drop(columns=['Source Name', 'Group']))
df_imputed = itera_imp.transform(merged_flight_data.drop(columns=['Source Name', 'Group']))
df_imputed = pd.DataFrame(df_imputed, columns=merged_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
df_imputed['Source Name'] = merged_flight_data['Source Name']
df_imputed['Group'] = merged_flight_data['Group']

corr_matrix = df_imputed.drop(columns=['Source Name','Group']).corr()
# Plot the heatmap
# fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(corr_matrix, cmap='coolwarm')
plt.show()

corr_mice_df = corr_matrix.unstack().reset_index()
corr_mice_df.rename(columns={'level_0': 'para_1', 'level_1':'para_2',
                          0:'corr_coef_mice_boost'}, inplace=True)
[IterativeImputer] Completing matrix with shape (20, 82)
[IterativeImputer] Ending imputation round 1/10, elapsed time 1.60
[IterativeImputer] Change: 1397.3284127942497, scaled tolerance: 65.6542740629091 
[IterativeImputer] Ending imputation round 2/10, elapsed time 3.10
[IterativeImputer] Change: 0.0, scaled tolerance: 65.6542740629091 
[IterativeImputer] Early stopping criterion reached.
[IterativeImputer] Completing matrix with shape (20, 82)
[IterativeImputer] Ending imputation round 1/2, elapsed time 0.25
[IterativeImputer] Ending imputation round 2/2, elapsed time 0.50

Non-Flight Data

Code
lgbr = HistGradientBoostingRegressor(random_state=2)
itera_imp = IterativeImputer(random_state=2, initial_strategy='median', estimator=lgbr, max_iter=10, verbose=2)
itera_imp.fit(merged_non_flight_data.drop(columns=['Source Name', 'Group']))
df_imputed = itera_imp.transform(merged_non_flight_data.drop(columns=['Source Name', 'Group']))
df_imputed = pd.DataFrame(df_imputed, columns=merged_non_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
df_imputed['Source Name'] = merged_non_flight_data['Source Name']
df_imputed['Group'] = merged_non_flight_data['Group']
corr_matrix = df_imputed.drop(columns=['Source Name','Group']).corr()

sns.heatmap(corr_matrix, cmap='coolwarm')
plt.show()
[IterativeImputer] Completing matrix with shape (80, 82)
[IterativeImputer] Ending imputation round 1/10, elapsed time 1.56
[IterativeImputer] Change: 1308.8961960633826, scaled tolerance: 57.9240767404965 
[IterativeImputer] Ending imputation round 2/10, elapsed time 3.25
[IterativeImputer] Change: 0.0, scaled tolerance: 57.9240767404965 
[IterativeImputer] Early stopping criterion reached.
[IterativeImputer] Completing matrix with shape (80, 82)
[IterativeImputer] Ending imputation round 1/2, elapsed time 0.26
[IterativeImputer] Ending imputation round 2/2, elapsed time 0.51

MICE with Bagging Regressor

Flight Data

Code
bagger = BaggingRegressor(random_state=2)
itera_bagger = IterativeImputer(random_state=2, initial_strategy='median', estimator=bagger, max_iter=50, verbose=2, tol=0.01)
itera_bagger.fit(merged_flight_data.drop(columns=['Source Name', 'Group']))
df_bag_imputed_flight = itera_bagger.transform(merged_flight_data.drop(columns=['Source Name', 'Group']))
df_bag_imputed_flight = pd.DataFrame(df_bag_imputed_flight, columns=merged_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
df_bag_imputed_flight['Source Name'] = merged_flight_data['Source Name']
df_bag_imputed_flight['Group'] = merged_flight_data['Group']

corr_matrix = df_bag_imputed_flight.drop(columns=['Source Name','Group']).corr()
sns.heatmap(corr_matrix, cmap='coolwarm')
plt.show()

corr_mice_bag_df = corr_matrix.unstack().reset_index()
corr_mice_bag_df.rename(columns={'level_0': 'para_1', 'level_1':'para_2',
                          0:'corr_coef_mice_bag'}, inplace=True)
[IterativeImputer] Completing matrix with shape (20, 82)
[IterativeImputer] Ending imputation round 1/50, elapsed time 0.37
[IterativeImputer] Change: 1940.5475771234073, scaled tolerance: 656.542740629091 
... (rounds 2-38 omitted)
[IterativeImputer] Ending imputation round 39/50, elapsed time 14.27
[IterativeImputer] Change: 576.7314548030308, scaled tolerance: 656.542740629091 
[IterativeImputer] Early stopping criterion reached.
[IterativeImputer] Completing matrix with shape (20, 82)
[IterativeImputer] Ending imputation round 1/39, elapsed time 0.02
... (rounds 2-38 omitted)
[IterativeImputer] Ending imputation round 39/39, elapsed time 0.83

Non-Flight Data

Code
bagger = BaggingRegressor(random_state=2)
itera_bagger = IterativeImputer(random_state=2, initial_strategy='median', estimator=bagger, max_iter=50, verbose=2, tol=0.01)
itera_bagger.fit(merged_non_flight_data.drop(columns=['Source Name', 'Group']))
df_bag_imputed_non_flight = itera_bagger.transform(merged_non_flight_data.drop(columns=['Source Name', 'Group']))
df_bag_imputed_non_flight = pd.DataFrame(df_bag_imputed_non_flight, columns=merged_non_flight_data.drop(columns=['Source Name', 'Group']).columns.to_list())
df_bag_imputed_non_flight['Source Name'] = merged_non_flight_data['Source Name']
df_bag_imputed_non_flight['Group'] = merged_non_flight_data['Group']

corr_matrix = df_bag_imputed_non_flight.drop(columns=['Source Name','Group']).corr()
sns.heatmap(corr_matrix, cmap='coolwarm')
plt.show()
[IterativeImputer] Completing matrix with shape (80, 82)
[IterativeImputer] Ending imputation round 1/50, elapsed time 0.37
[IterativeImputer] Change: 2569.9775989559485, scaled tolerance: 579.240767404965 
... (rounds 2-19 omitted)
[IterativeImputer] Ending imputation round 20/50, elapsed time 7.47
[IterativeImputer] Change: 786.7904174031114, scaled tolerance: 579.240767404965 
[IterativeImputer] Ending imputation round 21/50, elapsed time 7.84
[IterativeImputer] Change: 1100.4747530937366, scaled tolerance: 579.240767404965 
[IterativeImputer] Ending imputation round 22/50, elapsed time 8.21
[IterativeImputer] Change: 1049.3007503154654, scaled tolerance: 579.240767404965 
[IterativeImputer] Ending imputation round 23/50, elapsed time 8.58
[IterativeImputer] Change: 1205.8009388274622, scaled tolerance: 579.240767404965 
[IterativeImputer] Ending imputation round 24/50, elapsed time 8.94
[IterativeImputer] Change: 1278.3356786001968, scaled tolerance: 579.240767404965 
[IterativeImputer] Ending imputation round 25/50, elapsed time 9.31
[IterativeImputer] Change: 1139.744734496802, scaled tolerance: 579.240767404965 
[IterativeImputer] Ending imputation round 26/50, elapsed time 9.69
[IterativeImputer] Change: 494.5048673581032, scaled tolerance: 579.240767404965 
[IterativeImputer] Early stopping criterion reached.
[IterativeImputer] Completing matrix with shape (80, 82)
[IterativeImputer] Ending imputation round 1/26, elapsed time 0.03
[IterativeImputer] Ending imputation round 2/26, elapsed time 0.06
[IterativeImputer] Ending imputation round 3/26, elapsed time 0.08
[IterativeImputer] Ending imputation round 4/26, elapsed time 0.11
[IterativeImputer] Ending imputation round 5/26, elapsed time 0.14
[IterativeImputer] Ending imputation round 6/26, elapsed time 0.16
[IterativeImputer] Ending imputation round 7/26, elapsed time 0.19
[IterativeImputer] Ending imputation round 8/26, elapsed time 0.21
[IterativeImputer] Ending imputation round 9/26, elapsed time 0.24
[IterativeImputer] Ending imputation round 10/26, elapsed time 0.26
[IterativeImputer] Ending imputation round 11/26, elapsed time 0.29
[IterativeImputer] Ending imputation round 12/26, elapsed time 0.32
[IterativeImputer] Ending imputation round 13/26, elapsed time 0.34
[IterativeImputer] Ending imputation round 14/26, elapsed time 0.37
[IterativeImputer] Ending imputation round 15/26, elapsed time 0.39
[IterativeImputer] Ending imputation round 16/26, elapsed time 0.42
[IterativeImputer] Ending imputation round 17/26, elapsed time 0.44
[IterativeImputer] Ending imputation round 18/26, elapsed time 0.47
[IterativeImputer] Ending imputation round 19/26, elapsed time 0.49
[IterativeImputer] Ending imputation round 20/26, elapsed time 0.52
[IterativeImputer] Ending imputation round 21/26, elapsed time 0.55
[IterativeImputer] Ending imputation round 22/26, elapsed time 0.57
[IterativeImputer] Ending imputation round 23/26, elapsed time 0.60
[IterativeImputer] Ending imputation round 24/26, elapsed time 0.62
[IterativeImputer] Ending imputation round 25/26, elapsed time 0.65
[IterativeImputer] Ending imputation round 26/26, elapsed time 0.67
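One way to sanity-check an imputer of this kind (not part of the pipeline above, but a common pattern) is to mask a handful of observed values, impute them, and measure how well they are recovered. The sketch below uses synthetic stand-in data and the default `IterativeImputer` estimator; column names and sizes are illustrative only.

```python
# Hypothetical sanity check: hide a few observed values, impute, and score recovery.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(80, 5)), columns=list("abcde"))  # stand-in data

# Artificially remove ten known entries from one column
masked = df.copy()
holdout = rng.choice(df.index, size=10, replace=False)
true_vals = masked.loc[holdout, "a"].to_numpy()
masked.loc[holdout, "a"] = np.nan

# Impute and compare the recovered values against the held-out truth
imputer = IterativeImputer(random_state=2, initial_strategy="median", max_iter=50)
recovered = pd.DataFrame(imputer.fit_transform(masked), columns=df.columns)
rmse = np.sqrt(np.mean((recovered.loc[holdout, "a"].to_numpy() - true_vals) ** 2))
print(f"holdout RMSE: {rmse:.3f}")
```

A low holdout RMSE relative to the column's spread suggests the imputer is capturing real structure rather than inventing values.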

Checking the Correlations

Flight Data

SignificanceResult(statistic=np.float64(0.986830117455662), pvalue=np.float64(0.0))
SignificanceResult(statistic=np.float64(0.6872370858719213), pvalue=np.float64(0.0))
SignificanceResult(statistic=np.float64(0.9956665262301658), pvalue=np.float64(0.0))
SignificanceResult(statistic=np.float64(0.979579624050381), pvalue=np.float64(0.0))
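The code that produced these results is not shown; `SignificanceResult` objects of this shape are what `scipy.stats` correlation tests return, e.g. `spearmanr`. A minimal reconstruction of that kind of check, using stand-in arrays rather than the actual imputed columns:

```python
# Hypothetical reconstruction: the SignificanceResult lines above are the kind of
# object returned by scipy.stats rank-correlation tests. x and y are stand-ins
# for the column pairs actually compared (which are not shown in this notebook).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.9 * x + 0.1 * rng.normal(size=100)  # strongly correlated stand-in

res = stats.spearmanr(x, y)
print(res)  # SignificanceResult(statistic=..., pvalue=...)
```

A statistic near 1 with a vanishing p-value, as in the flight-data output above, indicates the compared columns are almost perfectly rank-correlated.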

SVM for Validating Imputations

Code
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

def train_svm_classifier(data, label_col='Group'):
    """
    Train an SVM classifier to distinguish between flight and non-flight samples.
    """
    # Prepare features and labels
    X = data.drop(columns=['Source Name', label_col])
    y = data[label_col].apply(lambda x: 1 if x == 'F' else 0)  # Binary classification: Flight (1) vs Non-Flight (0)

    # Split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Train the SVM classifier
    svm = SVC(kernel='linear', random_state=42)
    svm.fit(X_train, y_train)

    # Evaluate the classifier
    y_pred = svm.predict(X_test)
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    print("Accuracy:", accuracy_score(y_test, y_pred))

    return svm
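One caveat with `SVC` on raw features: a linear SVM is sensitive to feature scale, so columns with large magnitudes can dominate the margin. A variant of the classifier above that standardizes features first (not used in the original analysis; the data here are synthetic stand-ins) might look like:

```python
# Illustrative variant: standardize features before the linear SVM.
# The synthetic X deliberately mixes columns of wildly different scales.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(40, 5)) * [1, 10, 100, 1000, 10000]  # very different column scales
y = (X[:, 0] > 0).astype(int)  # label depends only on the smallest-scale column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=42))
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The pipeline ensures the scaler is fit only on the training split, avoiding leakage into the test set.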
Code
# Train and evaluate SVM on RNA-Seq data
print("SVM on RNA Seq Data:")
rnaseq_data = rr9_all_df[['Source Name', 'Group'] + rnaseq.columns.tolist()]

# Drop rows with NaN values
rnaseq_data_cleaned = rnaseq_data.dropna()

train_svm_classifier(rnaseq_data_cleaned)
SVM on RNA Seq Data:
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       1.00      1.00      1.00         2

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5

Accuracy: 1.0
SVC(kernel='linear', random_state=42)
Code
# Train and evaluate SVM on HNE staining data
print("SVM on HNE Staining Data:")
hne_data = rr9_all_df[['Source Name', 'Group'] + hne.columns.tolist()]

# Drop rows with NaN values
hne_data_cleaned = hne_data.dropna()

train_svm_classifier(hne_data_cleaned)
SVM on HNE Staining Data:
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         5
           1       1.00      1.00      1.00         2

    accuracy                           1.00         7
   macro avg       1.00      1.00      1.00         7
weighted avg       1.00      1.00      1.00         7

Accuracy: 1.0
SVC(kernel='linear', random_state=42)
Code
# Train and evaluate SVM on TUNEL data
print("SVM on TUNEL Data:")
tunel_data = rr9_all_df[['Source Name', 'Group'] + tunel.columns.tolist()]

# Drop rows with NaN values
tunel_data_cleaned = tunel_data.dropna()

train_svm_classifier(tunel_data_cleaned)
SVM on TUNEL Data:
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         5
           1       1.00      1.00      1.00         2

    accuracy                           1.00         7
   macro avg       1.00      1.00      1.00         7
weighted avg       1.00      1.00      1.00         7

Accuracy: 1.0
SVC(kernel='linear', random_state=42)

Code
# Train and evaluate SVM on MICE with Bagging Regressor-imputed data
print("SVM on MICE with Bagging Regressor-Imputed Data:")
mice_bag_imputed_data = pd.concat([df_bag_imputed_flight, df_bag_imputed_non_flight], axis=0)

# Drop rows with NaN values in the 'Group' column
mice_bag_imputed_data = mice_bag_imputed_data.dropna(subset=['Group'])

train_svm_classifier(mice_bag_imputed_data)
SVM on MICE with Bagging Regressor-Imputed Data:
Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.96      0.96        24
           1       0.83      0.83      0.83         6

    accuracy                           0.93        30
   macro avg       0.90      0.90      0.90        30
weighted avg       0.93      0.93      0.93        30

Accuracy: 0.9333333333333333
SVC(kernel='linear', random_state=42)

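With held-out sets of only 5 to 7 samples, a single train/test split's perfect accuracy is weak evidence of generalization. One hedged alternative (not used above) is leave-one-out cross-validation, which scores every sample once as the test case; the data here are synthetic stand-ins for the staining features.

```python
# Illustrative only: leave-one-out cross-validation for very small sample sizes.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                                   # stand-in features
y = (X[:, 0] + 0.3 * rng.normal(size=20) > 0).astype(int)      # stand-in labels

scores = cross_val_score(SVC(kernel="linear", random_state=42), X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.2f} over {len(scores)} folds")
```

Averaging over every fold gives a less optimistic, lower-variance accuracy estimate than one random split when n is this small.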
Use Embeddings to Impute

                                              filename Mice Name  \
0    /kaggle/working/data2/CH1/Chc20_RR9_Ret_20X_PN...     Chc20   
1    /kaggle/working/data2/CH1/Chc20_RR9_Ret_20X_PN...     Chc20   
2    /kaggle/working/data2/CH1/Chc20_RR9_Ret_20X_PN...     Chc20   
3    /kaggle/working/data2/CH1/Chc20_RR9_Ret_20X_PN...     Chc20   
4    /kaggle/working/data2/CH1/Chc20_RR9_Ret_20X_PN...     Chc20   
..                                                 ...       ...   
178  /kaggle/working/data2/CH1/VG20_RR9_Ret_20X_PNA...     CC220   
179  /kaggle/working/data2/CH1/VG20_RR9_Ret_20X_PNA...     CC220   
180  /kaggle/working/data2/CH1/VG20_RR9_Ret_20X_PNA...     CC220   
181  /kaggle/working/data2/CH1/VG20_RR9_Ret_20X_PNA...     CC220   
182  /kaggle/working/data2/CH1/VG20_RR9_Ret_20X_PNA...     CC220   

    Staining Technique Channel  
0                  Ret     CH1  
1                  Ret     CH1  
2                  Ret     CH1  
3                  Ret     CH1  
4                  Ret     CH1  
..                 ...     ...  
178                Ret     CH1  
179                Ret     CH1  
180                Ret     CH1  
181                Ret     CH1  
182                Ret     CH1  

[183 rows x 4 columns]
/Users/vaishnavinagesh/Desktop/AI-ML_AWG/.venv/lib/python3.12/site-packages/sklearn/utils/extmath.py:337: RuntimeWarning: divide by zero encountered in matmul
/Users/vaishnavinagesh/Desktop/AI-ML_AWG/.venv/lib/python3.12/site-packages/sklearn/utils/extmath.py:337: RuntimeWarning: overflow encountered in matmul
/Users/vaishnavinagesh/Desktop/AI-ML_AWG/.venv/lib/python3.12/site-packages/sklearn/utils/extmath.py:337: RuntimeWarning: invalid value encountered in matmul
(the same three RuntimeWarnings repeat for extmath.py lines 338, 342, 529, and 543)
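These RuntimeWarnings originate in `sklearn.utils.extmath`, the matrix-product utilities used by SVD/PCA-based routines, and a common trigger is non-finite or extremely large entries in the input matrix. A hedged mitigation sketch (the embedding matrix, its shape, and the bad entry below are all stand-ins, not the actual RR9 embeddings):

```python
# Hypothetical mitigation for matmul divide-by-zero/overflow/invalid warnings:
# replace non-finite entries and standardize before any SVD/PCA-based step.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
emb = rng.normal(size=(183, 64)) * 1e3  # stand-in for image embeddings
emb[0, 0] = np.inf                      # simulate a corrupted entry

# Replace non-finite values with their column means
emb = np.where(np.isfinite(emb), emb, np.nan)
col_means = np.nanmean(emb, axis=0)
emb = np.where(np.isnan(emb), col_means, emb)

# Standardize, then reduce dimensionality without numerical warnings
emb_scaled = StandardScaler().fit_transform(emb)
reduced = PCA(n_components=10, random_state=1).fit_transform(emb_scaled)
print(reduced.shape)  # (183, 10)
```

Cleaning and scaling the matrix up front keeps the downstream decomposition numerically well-behaved.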